Detection Transformer (DETR) directly transforms queries to unique objects by using one-to-one bipartite matching during training and enables end-to-end object detection. Recently, these models have surpassed traditional detectors on COCO with undeniable elegance. However, they differ from traditional detectors in multiple designs, including model architecture and training schedules, and thus the effectiveness of one-to-one matching is not fully understood. In this work, we conduct a strict comparison between the one-to-one Hungarian matching in DETRs and the one-to-many label assignments in traditional detectors with non-maximum suppression (NMS). Surprisingly, we observe that one-to-many assignments with NMS consistently outperform standard one-to-one matching under the same setting, with a significant gain of up to 2.5 mAP. Our detector, which trains Deformable-DETR with traditional IoU-based label assignment, achieves 50.2 COCO mAP within 12 epochs (1x schedule) with a ResNet50 backbone, outperforming all existing traditional or transformer-based detectors in this setting. On multiple datasets, schedules, and architectures, we consistently show that bipartite matching is unnecessary for performant detection transformers. Furthermore, we attribute the success of detection transformers to their expressive transformer architecture. Code is available at https://github.com/jozhang97/DETA.
translated by Google Translate
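To make the contrast in the abstract concrete, here is a minimal sketch of a generic one-to-many, IoU-based label assignment plus greedy NMS. It assumes boxes in `(x1, y1, x2, y2)` form and a single positive-IoU threshold of 0.5; the names `assign_one_to_many` and `nms` are hypothetical, and this is a simplified max-IoU rule, not the exact assignment used in DETA. Unlike Hungarian matching, several predictions may be assigned to the same ground-truth box, which is why NMS is needed at inference.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_one_to_many(preds: List[Box], gts: List[Box], pos_thr: float = 0.5) -> List[int]:
    """Give each prediction the index of its best-overlapping ground truth,
    or -1 (background) if that IoU is below pos_thr. Several predictions
    may share one ground truth (one-to-many)."""
    labels = []
    for p in preds:
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        labels.append(best_j if best_iou >= pos_thr else -1)
    return labels


def nms(boxes: List[Box], scores: List[float], thr: float = 0.5) -> List[int]:
    """Greedy non-maximum suppression: visit boxes by descending score and
    keep a box only if it overlaps no already-kept box by more than thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thr for k in keep):
            keep.append(i)
    return keep
```

For example, with one ground truth `(0, 0, 10, 10)` and predictions `(0, 0, 10, 10)`, `(1, 1, 10, 10)`, `(20, 20, 30, 30)`, the first two are both labeled 0 (IoU 1.0 and 0.81) and the third is background; NMS with scores `[0.9, 0.8, 0.7]` then keeps indices 0 and 2.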
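Accurate and automatic segmentation of three-dimensional (3D) individual teeth from cone-beam computed tomography (CBCT) images is a challenging problem because of the difficulty of separating an individual tooth from adjacent teeth and the surrounding alveolar bone. Thus, this paper proposes a fully automated method for identifying and segmenting 3D individual teeth from dental CBCT images. The proposed method addresses this difficulty by developing a deep learning-based hierarchical multi-step model. First, it automatically generates panoramic images of the upper and lower jaws to overcome the computational complexity caused by high-dimensional data and the curse of dimensionality associated with limited training datasets. The obtained 2D panoramic images are then used to identify 2D individual teeth and to capture loose and tight regions of interest (ROIs) of 3D individual teeth. Finally, accurate 3D individual tooth segmentation is achieved using both the loose and tight ROIs. Experimental results showed that the proposed method achieved an F1-score of 93.35% for tooth identification and a Dice similarity coefficient of 94.79% for individual 3D tooth segmentation. These results demonstrate that the proposed method provides an effective clinical and practical framework for digital dentistry.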
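The two metrics reported in this abstract can be stated precisely. Below is a minimal sketch, assuming binary masks flattened into 0/1 sequences and tooth-identification counts given as true positives, false positives, and false negatives; the function names are illustrative, and the convention of returning 1.0 when both masks are empty is an assumption.

```python
from typing import Sequence


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks
    given as flat sequences of 0/1 voxels. Returns 1.0 if both are empty."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    return 2.0 * inter / total if total else 1.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN), e.g. over identified teeth."""
    denom = 2 * tp + fp + fn
    return 2.0 * tp / denom if denom else 1.0
```

For instance, `dice([1, 1, 0, 0], [1, 0, 1, 0])` is 0.5 (one shared voxel out of 2 + 2), and `f1_score(9, 1, 1)` is 0.9.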
Figure 1. An illustration of standard knowledge distillation. Despite its widespread use, an understanding of when the student can learn from the teacher is still missing.
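As a reference point for what "standard knowledge distillation" usually means, here is a minimal sketch of the Hinton-style loss for a single example: a cross-entropy term on the hard label plus a temperature-softened KL term pulling the student toward the teacher. The default temperature of 4.0 and weight `alpha = 0.5` are illustrative assumptions, not values from the source.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kd_loss(student_logits: List[float], teacher_logits: List[float], label: int,
            temperature: float = 4.0, alpha: float = 0.5) -> float:
    """alpha * CE(label, student) + (1 - alpha) * T^2 * KL(teacher_T || student_T).
    The T^2 factor keeps soft-target gradients on a comparable scale as T changes."""
    ce = -math.log(softmax(student_logits)[label])
    q_t = softmax(teacher_logits, temperature)
    q_s = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_t, q_s) if t > 0)
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl
```

When the student already matches the teacher exactly, the KL term vanishes and only the weighted cross-entropy remains.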
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for method development. 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, achieving up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
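The challenge requires INT8 models, which store weights and activations as 8-bit integers with a scale factor. The sketch below shows one common scheme, symmetric per-tensor quantization with scale `s = max|x| / 127`. It is a generic illustration and an assumption, not the specific quantization used by the challenge participants or the VS680 NPU toolchain.

```python
from typing import List, Tuple


def quantize_int8(x: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor INT8 quantization:
    q = clamp(round(x / s), -128, 127) with scale s = max|x| / 127."""
    max_abs = max((abs(v) for v in x), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid divide-by-zero on all-zero input
    q = [max(-128, min(127, round(v / scale))) for v in x]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map INT8 codes back to approximate real values."""
    return [v * scale for v in q]
```

For values inside the calibrated range, the round-trip error is at most half a quantization step, `s / 2`. As a point of arithmetic, 3X upscaling to Full HD (1920×1080) means a 640×360 input, and 60 FPS leaves roughly 16.7 ms per frame.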
Robotics has been widely applied in smart construction, for generating digital twins or for autonomous inspection of construction sites. For example, thermal inspection during concrete curing requires continual monitoring of the concrete temperature to ensure concrete strength and avoid cracks. However, buildings are typically too large to be monitored by installing fixed thermal cameras, and post-processing is required to compute the accumulated heat at each measurement point. Thus, an autonomous monitoring system capable of long-term thermal mapping at a large construction site can provide both cost-effectiveness and a precise safety margin for curing-period estimation. Therefore, this study proposes a low-cost thermal mapping system, consisting of a 2D range scanner attached to a consumer-level inertial measurement unit and a thermal camera, for automated heat monitoring in construction using mobile robots.
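The abstract does not say how the "accumulated heat" of a measurement point is computed. One common way to quantify it in concrete curing is the Nurse-Saul temperature-time factor (maturity), $M = \sum \max(T - T_0, 0)\,\Delta t$. The sketch below assumes that formula, temperature samples `(time_h, temp_c)` for one point, and a datum temperature `T0 = -10 °C`. The formula choice, the datum value, and the function name are all assumptions for illustration.

```python
from typing import List, Tuple


def temperature_time_factor(readings: List[Tuple[float, float]],
                            datum_c: float = -10.0) -> float:
    """Nurse-Saul temperature-time factor M = sum(max(T - T0, 0) * dt), in degC*h.

    readings: (time_h, temp_c) samples for one measurement point, sorted by time.
    Each interval uses the mean of its two endpoint temperatures (trapezoidal rule);
    intervals colder than the datum temperature contribute nothing."""
    m = 0.0
    for (t0, c0), (t1, c1) in zip(readings, readings[1:]):
        mean_c = 0.5 * (c0 + c1)
        m += max(mean_c - datum_c, 0.0) * (t1 - t0)
    return m
```

For example, samples `[(0, 10), (1, 10), (2, 20)]` give 20 + 25 = 45 °C·h. A mobile robot revisiting each point would append new samples to that point's list and recompute (or incrementally update) `M`.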
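Sequential recommender systems have shown effective recommendations by capturing users' interest drift. Existing sequential models fall into two groups: user-centric and item-centric models. User-centric models capture personalized interest drift based on each user's sequential consumption history, but do not explicitly consider whether users' interest in items is sustained beyond the training time, i.e., interest sustainability. On the other hand, item-centric models consider whether users' general interest is sustained after the training time, but this is not personalized. In this work, we propose a recommender system that takes advantage of both kinds of models. Our proposed model captures personalized interest sustainability, indicating whether each user's interest in items will be sustained beyond the training time. We first formulate a task that requires predicting which items each user will consume within the training period, based on users' consumption histories. We then propose simple yet effective schemes to augment users' sparse consumption histories. Extensive experiments show that the proposed model outperforms 10 baseline models on 11 real-world datasets. The code is available at https://github.com/dmhyun/peris.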
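Open-set recognition (OSR) assumes that unknown instances appear unexpectedly at inference time. The main challenge of OSR is that a model's response to unknowns is totally unpredictable. Furthermore, the diversity of open-set situations makes the problem even harder, because instances have different difficulty levels. Therefore, we propose a novel framework, the Difficulty-Aware Simulator (DIAS), which generates fakes with diverse difficulty levels to simulate the real world. We first investigate fakes from a generative adversarial network (GAN) from the classifier's viewpoint and observe that these fakes are not very challenging. This leads us to define the criteria for difficulty by regarding samples generated by the GAN as having moderate difficulty. To produce hard examples, we introduce an imitator that mimics the behavior of the classifier. Furthermore, our modified GAN and the imitator also produce moderate- and easy-difficulty samples, respectively. As a result, DIAS outperforms state-of-the-art methods on both the AUROC and F-score metrics. Our code is available at https://github.com/wjun0830/difficulty-aware-simulator.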
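AUROC, one of the two metrics cited, measures how well a per-sample score separates known from unknown instances, independent of any threshold. The minimal sketch below computes it via its rank interpretation: the probability that a randomly chosen known sample scores higher than a randomly chosen unknown one, with ties counted as 1/2. It assumes higher scores mean "more likely known". For example, the score might be the maximum softmax probability, which is an assumption here and not necessarily the score DIAS uses.

```python
from typing import Sequence


def auroc(known_scores: Sequence[float], unknown_scores: Sequence[float]) -> float:
    """AUROC for separating known (positive) from unknown (negative) samples,
    computed as P(score_known > score_unknown) with ties counted as 1/2."""
    if not known_scores or not unknown_scores:
        raise ValueError("need at least one known and one unknown sample")
    wins = 0.0
    for k in known_scores:
        for u in unknown_scores:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(known_scores) * len(unknown_scores))
```

This pairwise form is O(n·m) and is meant for clarity; for large evaluation sets, a sort-based implementation computes the same value more efficiently.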
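Semi-supervised video object segmentation (VOS) aims to densely track certain designated objects in videos. One of the main challenges in this task is the presence of background distractors that appear similar to the target objects. We propose three novel strategies to suppress such distractors: 1) a spatio-temporally diversified template construction scheme to obtain generalized properties of the target objects; 2) a learnable distance-scoring function that excludes spatially distant distractors by exploiting the temporal consistency between two consecutive frames; and 3) swap-and-attach augmentation, which forces each object to have unique features by providing training samples containing entangled objects. On all public benchmark datasets, our model achieves performance comparable to contemporary state-of-the-art methods while running in real time. Qualitative results also demonstrate the superiority of our approach over existing methods. We believe our approach will be widely used in future VOS research.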
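A common research goal of self-supervised learning is to extract a general representation from which arbitrary downstream tasks can benefit. In this work, we investigate music audio representations learned with different contrastive self-supervised learning schemes, and empirically evaluate the embedding vectors on various music information retrieval (MIR) tasks that involve different levels of music perception. We analyze the results to discuss the proper direction of contrastive learning strategies for different MIR tasks. We show that these representations convey comprehensive information about the auditory characteristics of music in general, although each self-supervision strategy has its own effectiveness for certain aspects of that information.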
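Contrastive self-supervised schemes like those compared in this abstract typically optimize an InfoNCE-style objective. Each anchor embedding should be most similar to its own positive, for example another augmented crop of the same audio clip, and less similar to the other clips in the batch. The sketch below is a minimal, one-directional InfoNCE with cosine similarity and temperature `tau`. The batch layout, the default `tau = 0.1`, and the use of only cross-view negatives are assumptions. It is not the specific loss of any scheme evaluated in the paper, and it assumes non-zero embedding vectors.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def info_nce(anchors: List[Vector], positives: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] should match positives[i]; every other
    positives[j] (j != i) in the batch serves as a negative for anchor i."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / tau for p in positives]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]  # -log softmax probability of the true pair
    return total / len(anchors)
```

With two orthogonal clips whose views are perfectly aligned, the loss is close to zero. Swapping the positives drives it to about `1 / tau` = 10, which is the signal that pushes the encoder to map views of the same clip together.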